AITopics | stochastic error

Collaborating Authors

stochastic error

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimally taming biases in black-box models for efficient semiparametric estimation

Gu, Yihong, Yin, Qishuo, Cai, Tianxi, Fan, Jianqing

arXiv.org Machine LearningJun-5-2026

Modern semiparametric estimation often relies on flexible black-box machine learning methods to estimate nuisance functions, raising a fundamental question: how do nuisance estimation errors propagate into inference for low-dimensional target parameters? The dominant paradigm, exemplified by double machine learning (DML), yields error bounds in which nuisance estimation errors enter multiplicatively. While widely adopted, it remains unclear whether this multiplicative-rate dependence is optimal for black-box models. In this paper, we start by revisiting the partial linear model $Y = μ_0(X)+T\cdotβ_0+\varepsilon$ under a structure-agnostic setting, where the nuisance function $μ_0$ is estimated using a generic machine learning model, with approximation error $δ^a_μ$ and stochastic error $δ_μ^s$. We show that the standard DML rate is not optimal in the regime where the auxiliary function $\mathbb{E}[T|X=x]$ cannot be consistently estimated. We propose a new estimator for $β_0$ that achieves a sharper rate of $n^{-1/2}+δ^a_μ+(δ_μ^s)^2$ and establish a matching lower bound demonstrating its optimality. Our results reveal a new principle: the first-order stochastic error of nuisance estimation can be eliminated without imposing any additional assumptions. This also leads to a revised tuning strategy favoring under-smoothing, where $δ^a_μ\asymp(δ_μ^s)^2$, rather than the classical bias-variance trade-off $δ^a_μ\asymp δ_μ^s$. Under mild additional conditions, the estimator is asymptotically normal with minimal asymptotic variance. The proposed method extends to a broad class of semi-parametric linear functional estimation problems, including average treatment effect estimation. Our results imply that popular orthogonal score methods in semiparametric estimation with black-box nuisance learners can be substantially improved.

artificial intelligence, estimation, machine learning, (17 more...)

arXiv.org Machine Learning

2606.06368

Genre: Research Report > New Finding (0.68)

Industry: Transportation > Air (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

DifferentiableAnnealedImportanceSampling andthePerilsofGradientNoise

Neural Information Processing SystemsFeb-10-2026, 09:00:47 GMT

Marginal likelihood (ML), sometimes calledevidence, is a central quantity in Bayesian learning as it measures how well a model can describe a particular dataset.

artificial intelligence, machine learning, transition, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Asymptotic confidence bands for centered purely random forests

Neumeyer, Natalie, Rabe, Jan, Trabs, Mathias

arXiv.org Machine LearningNov-18-2025

In a multivariate nonparametric regression setting we construct explicit asymptotic uniform confidence bands for centered purely random forests. Since the most popular example in this class of random forests, namely the uniformly centered purely random forests, is well known to suffer from suboptimal rates, we propose a new type of purely random forests, called the Ehrenfest centered purely random forests, which achieve minimax optimal rates. Our main confidence band theorem applies to both random forests. The proof is based on an interpretation of random forests as generalized U-Statistics together with a Gaussian approximation of the supremum of empirical processes. Our theoretical findings are illustrated in simulation examples.

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Machine Learning

2511.13199

Country:

North America > United States > New York (0.04)
North America > United States > California (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Error Analysis of Discrete Flow with Generator Matching

Wan, Zhengyan, Ouyang, Yidong, Yao, Qiang, Xie, Liyan, Fang, Fang, Zha, Hongyuan, Cheng, Guang

arXiv.org Machine LearningSep-29-2025

Discrete diffusion models have achieved significant progress in large language models [24, 42, 41, 39]. By learning the time reversal of the noising process of a continuous-time Markov chain (CTMC), the models transform a simple distribution (e.g., uniform [19, 23] and masked [26, 32, 30]) that is easy to sample to the data distribution that has discrete structures. Discrete flow models [10, 16, 31] provides a flexible framework for learning generating transition rate analogous to continuous flow matching [1, 22, 21], offering a more comprehensive family of probability paths. Recent theoretical analysis for discrete diffusion models has emerged through numerous studies [11, 40, 28, 29]. To obtain the transition rate in the reversed process, the concrete scores in these analyses are obtained by minimizing the concrete score entropy introduced in [23, 8]. In those works, the distribution errors of discrete diffusion models are divided into three parts: (a) truncation error from truncating the time horizon in the noising process; (b) concrete score estimation error; (c) discretization error from sampling algorithms. In our paper, we aim to investigate the theoretical properties of the discrete flow-based models using the generator matching training objective [18] and the uniformization sampling algorithm [11], which offers zero truncation error and discretization error.

diffusion model, international conference, transition rate, (14 more...)

arXiv.org Machine Learning

2509.21906

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

On Non-asymptotic Theory of Recurrent Neural Networks in Temporal Point Processes

Chen, Zhiheng, Fang, Guanhua, Yu, Wen

arXiv.org Machine LearningJun-2-2024

Temporal point process (TPP) is an important tool for modeling and predicting irregularly timed events across various domains. Recently, the recurrent neural network (RNN)-based TPPs have shown practical advantages over traditional parametric TPP models. However, in the current literature, it remains nascent in understanding neural TPPs from theoretical viewpoints. In this paper, we establish the excess risk bounds of RNN-TPPs under many well-known TPP settings. We especially show that an RNN-TPP with no more than four layers can achieve vanishing generalization errors. Our technical contributions include the characterization of the complexity of the multi-layer RNN class, the construction of $\tanh$ neural networks for approximating dynamic event intensity functions, and the truncation technique for alleviating the issue of unbounded event sequences. Our results bridge the gap between TPP's application and neural network theory.

approximation error, exp, neural network, (14 more...)

arXiv.org Machine Learning

2406.0063

Country:

Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.70)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A kernel-based analysis of Laplacian Eigenmaps

Wahl, Martin

arXiv.org Machine LearningFeb-26-2024

Given i.i.d. observations uniformly distributed on a closed manifold $\mathcal{M}\subseteq \mathbb{R}^p$, we study the spectral properties of the associated empirical graph Laplacian based on a Gaussian kernel. Our main results are non-asymptotic error bounds, showing that the eigenvalues and eigenspaces of the empirical graph Laplacian are close to the eigenvalues and eigenspaces of the Laplace-Beltrami operator of $\mathcal{M}$. In our analysis, we connect the empirical graph Laplacian to kernel principal component analysis, and consider the heat kernel of $\mathcal{M}$ as reproducing kernel feature map. This leads to novel points of view and allows to leverage results for empirical covariance operators in infinite dimensions.

high probability, operator, probability, (17 more...)

arXiv.org Machine Learning

2402.16481

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

A Stability Principle for Learning under Non-Stationarity

Huang, Chengpiao, Wang, Kaizheng

arXiv.org Machine LearningOct-27-2023

We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptability of this approach to unknown non-stationarity. The regret bound is minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.

artificial intelligence, assumption 4, machine learning, (17 more...)

arXiv.org Machine Learning

2310.18304

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Non-Vacuous Generalisation Bounds for Shallow Neural Networks

Biggs, Felix, Guedj, Benjamin

arXiv.org Machine LearningFeb-4-2022

The study of generalisation properties of deep neural networks is arguably one of the topics gaining most traction in deep learning theory (see, e.g., the recent surveys Kawaguchi et al., 2020; Jiang et al., 2020b). In particular, a characterisation of out-of-sample generalisation is essential to understand where trained neural networks are likely to succeed or to fail, as evidenced by the recent NeurIPS 2020 competition "Predicting Generalization in Deep Learning" (Jiang et al., 2020a). One stream of this joint effort, which the present paper contributes to, is dedicated to the study of shallow neural networks, potentially paving the way to insights on deeper architectures.

bound, non-vacuous generalisation bound, test error, (10 more...)

arXiv.org Machine Learning

2202.01627

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > New York > New York County > New York City (0.14)
North America > Canada > Quebec > Montreal (0.04)
(10 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Add feedback

Accelerated and instance-optimal policy evaluation with linear function approximation

Li, Tianjiao, Lan, Guanghui, Pananjady, Ashwin

arXiv.org Machine LearningDec-24-2021

We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results that match those in the i.i.d. setting up to a multiplicative factor that is proportional to the mixing time of the chain. Our theoretical guarantees of optimality are corroborated by numerical experiments.

algorithm, approximation, stochastic error, (13 more...)

arXiv.org Machine Learning

2112.13109

Country:

North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)

Add feedback

A game-theoretic approach for Generative Adversarial Networks

Franci, Barbara, Grammatico, Sergio

arXiv.org Machine LearningMar-30-2020

Generative adversarial networks (GANs) are a class of generative models, known for producing accurate samples. The key feature of GANs is that there are two antagonistic neural networks: the generator and the discriminator. The main bottleneck for their implementation is that the neural networks are very hard to train. One way to improve their performance is to design reliable algorithms for the adversarial process. Since the training can be cast as a stochastic Nash equilibrium problem, we rewrite it as a variational inequality and introduce an algorithm to compute an approximate solution. Specifically, we propose a stochastic relaxed forward-backward algorithm for GANs. We prove that when the pseudogradient mapping of the game is monotone, we have convergence to an exact solution or in a neighbourhood of it.

algorithm, cost function, discriminator, (14 more...)

arXiv.org Machine Learning

2003.13637

Country: Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback